The Netflix Prize: Alternating Least Squares in MPI

Author

  • Sean Harnett
Abstract

In 2006 Netflix announced a million-dollar prize for the first team that could beat its Cinematch recommendation system by 10% on a particular test data set. Specifically, given over 100 million ratings on a scale of 1 to 5, from 480,189 users on 17,770 movies, the goal was to produce predictions for the test set that minimize the root mean square error (RMSE). Cinematch scored an RMSE of 0.9525, so the target was an RMSE of 0.8572 or less. My goal in this project was simply to beat Cinematch, using a parallel implementation. The prize was won in September 2009 by a team using a blend of many different techniques. One technique that played a large part in the winning blend was matrix factorization: the roughly 480,000 × 17,000 (sparse) ratings matrix, R, is approximated by a product of two much smaller matrices. A singular value decomposition can produce two such matrices and conveniently minimizes the squared norm of the error, but a standard SVD requires every entry of the matrix to be known, and most entries of R are missing. After choosing a number of “features”, f, you want an f × 480,000 user matrix U and an f × 17,000 movie matrix M whose product, U^T M, comes as close as possible to the given training data in R. A number of algorithms exist to find these two matrices; I used an approach called alternating least squares with weighted-λ-regularization. [1]
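The alternating scheme named in the abstract can be illustrated with a small NumPy sketch. It is not the author's MPI code: it is a serial toy version, assuming a dense array `R` with a boolean `mask` marking observed ratings, and assuming the usual weighted-λ form in which each user's (or movie's) penalty is scaled by its number of ratings. The names `rmse`, `solve_side`, and `als_wr` and the default parameter values are hypothetical.

```python
import numpy as np

def rmse(R, mask, U, M):
    """Root mean square error of U^T M against R, over observed entries only."""
    err = (R - U.T @ M)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

def solve_side(R, mask, fixed, lam):
    """Solve for one factor matrix while the other ('fixed') is held constant.

    R is (n_rows, n_cols), mask is a same-shape boolean array, and fixed is
    (f, n_cols). Returns an (f, n_rows) matrix; each column comes from one
    small regularized least-squares solve that is independent of the others.
    """
    f = fixed.shape[0]
    out = np.zeros((f, R.shape[0]))
    for i in range(R.shape[0]):
        idx = np.flatnonzero(mask[i])          # columns observed in row i
        if idx.size == 0:
            continue                           # no ratings: leave zeros
        F = fixed[:, idx]                      # f x n_i block of the fixed factors
        A = F @ F.T + lam * idx.size * np.eye(f)   # penalty weighted by n_i
        b = F @ R[i, idx]
        out[:, i] = np.linalg.solve(A, b)
    return out

def als_wr(R, mask, f=2, lam=0.05, iters=20, seed=0):
    """Alternate between solving for U (users) and M (movies)."""
    rng = np.random.default_rng(seed)
    M = rng.normal(scale=0.1, size=(f, R.shape[1]))   # small random start
    history = []
    for _ in range(iters):
        U = solve_side(R, mask, M, lam)        # fix M, solve every user
        M = solve_side(R.T, mask.T, U, lam)    # fix U, solve every movie
        history.append(rmse(R, mask, U, M))
    return U, M, history
```

Because every row's solve inside `solve_side` depends only on the fixed factor matrix and that row's own ratings, the loop can in principle be split across processes, which is the property a parallel version would exploit. The real data set would also need a sparse representation rather than the dense `R` and `mask` assumed here.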


Similar resources

Large-Scale Parallel Collaborative Filtering for the Netflix Prize

Many recommendation systems suggest items to users by utilizing the techniques of collaborative filtering (CF) based on historical records of items that the users have viewed, purchased, or rated. Two major problems that most CF approaches have to resolve are scalability and sparseness of the user profiles. In this paper, we describe Alternating-Least-Squares with Weighted-λ-Regularization (ALS...


Statistical Properties of Alternating Least Squares Estimators of a Collaborative Filtering Model

Recommender systems are emerging as important tools for improving customer satisfaction by mathematically predicting user preferences. Several major corporations including Amazon.com and Pandora use these types of systems to suggest additional options based on current or recent purchases. Netflix uses a recommender system to provide its customers with suggestions for movies that they may like, ...


The Netflix Prize High Performance Computing Neural Networks Final Report

A solution for the Netflix Prize was developed based on back propagation neural networks. The solution is different than most other Collaborative Filtering techniques in that rather than perform a global dimensionality reduction, this method focuses on each desired prediction by creating an entirely new neural network for each prediction. The implementation was parallelized using MPI, achieving...


P-Tree Singular Value Decomposition Item-Feature Collaborative Filtering Algorithm for Netflix Prize

Collaborative Filtering is effective at providing customers with personalized recommendations by analyzing purchase patterns. Matrix factorization, e.g. Singular Value Decomposition, is another successful technique in recommendation systems. We implemented a Singular Value Decomposition algorithm to achieve the least total squared error. Based on the result, item-feature Collaborative Filtering ...


Parallel stochastic gradient algorithms for large-scale matrix completion

This paper develops Jellyfish, an algorithm for solving data-processing problems with matrix-valued decision variables regularized to have low rank. Particular examples of problems solvable by Jellyfish include matrix completion problems and least-squares problems regularized by the nuclear norm or γ2-norm. Jellyfish implements a projected incremental gradient method with a biased, random order...




Publication date: 2010